Search Result

Long text classification combined with attention mechanism

LU Ling, YANG Wu, WANG Yuanlun, LEI Zijian, LI Ying

Journal of Computer Applications 2018, 38 (5): 1272-1277. DOI: 10.11772/j.issn.1001-9081.2017112652

News text usually consists of tens to hundreds of sentences; it has a large number of characters and contains much information that is irrelevant to the topic, which degrades classification performance. To address this problem, a long text classification method combined with an attention mechanism was proposed. Firstly, each sentence was represented by a paragraph vector, and a neural network attention model over paragraph vectors and text categories was constructed to calculate each sentence's attention. Then sentences were filtered according to their contribution to the category, measured as the mean square error of the sentence attention vector. Finally, a classifier based on a Convolutional Neural Network (CNN) was constructed. The filtered text and the attention matrix were each taken as the network input. Max pooling was used for feature filtering, and random dropout was used to reduce over-fitting. Experiments were conducted on the dataset of the Chinese news text classification task, one of the shared tasks of Natural Language Processing and Chinese Computing (NLP&CC) 2014. The proposed method achieved an accuracy of 80.39% on the filtered text, whose length was 82.74% of the text before filtering, an accuracy improvement of 2.1% over the unfiltered text. The experimental results show that, combined with the attention mechanism, the proposed method can improve the accuracy of long text classification while achieving sentence-level information filtering.
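The sentence-filtering step in the abstract above can be illustrated with a minimal sketch. It assumes that each sentence already has an attention vector over the text categories, and that "mean square error of the sentence attention vector" means the mean squared deviation of that vector from its own mean; both the interpretation and the names `contribution`, `filter_sentences` and `keep_ratio` are assumptions for illustration, not taken from the paper.

```python
from typing import List, Sequence


def contribution(attention: Sequence[float]) -> float:
    """Mean squared deviation of one attention vector from its own mean.

    A flat vector (equal attention to every category) scores 0, so the
    sentence is treated as carrying no category-specific information.
    """
    mean = sum(attention) / len(attention)
    return sum((a - mean) ** 2 for a in attention) / len(attention)


def filter_sentences(
    sentences: List[str],
    attentions: List[Sequence[float]],
    keep_ratio: float = 0.8,  # hypothetical parameter, not from the paper
) -> List[str]:
    """Keep the highest-contribution sentences, preserving document order."""
    if len(sentences) != len(attentions):
        raise ValueError("one attention vector is required per sentence")
    n_keep = max(1, round(len(sentences) * keep_ratio))
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: contribution(attentions[i]),
        reverse=True,
    )
    kept = sorted(ranked[:n_keep])  # restore original sentence order
    return [sentences[i] for i in kept]
```

The filtered sentences would then be passed to the CNN classifier; that stage is not sketched here.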

Stance detection method based on entity-emotion evolution belief net

LU Ling, YANG Wu, LIU Xu, LI Yan

Journal of Computer Applications 2017, 37 (5): 1402-1406. DOI: 10.11772/j.issn.1001-9081.2017.05.1402

To address stance detection for Chinese social network reviews that lack theme or emotion features, a stance detection method based on an entity-emotion evolution Bayesian belief net was proposed. Firstly, three types of domain-dependent entities were extracted: nouns, verb-object phrases, and verb-noun compound attributive-centered structures. Domain-related emotion features were also extracted, and the variable correlation strength was used as a constraint on the learning of the network structure. Then a 2-dependence Bayesian network classifier was constructed to describe the dependence among entity, stance and emotion features. The stance of a review was deduced from the combined conditions of its entities and emotion features. Experiments were conducted on the Natural Language Processing & Chinese Computing 2016 (NLP&CC2016) data. The experimental results show that the average micro-F reaches 70.8%, and that the average precision of FAVOR and AGAINST increases by 4.1 and 3.1 percentage points respectively over the Bayesian network classification method with emotion features only. The average micro-F on the 5 target data sets of the evaluation reaches 62.3%, which exceeds the average level of the evaluation.

Chinese short text classification method by combining semantic expansion and convolutional neural network

LU Ling, YANG Wu, YANG Youjun, CHEN Menghan

Journal of Computer Applications 2017, 37 (12): 3498-3503. DOI: 10.11772/j.issn.1001-9081.2017.12.3498

A Chinese news title usually consists of one word to dozens of words. It is difficult to improve the accuracy of news title classification because titles have few characters and sparse features. To solve these problems, a new method for text semantic expansion based on word embedding was proposed. Firstly, the news title was expanded into a triple consisting of title, subtitle and keywords. The subtitle was constructed by combining synonyms of the title with a part-of-speech filtering method, and the keywords were extracted from the semantic composition of words in multi-scale sliding windows. Then, a Convolutional Neural Network (CNN) model was constructed for categorizing the expanded text. Max pooling was used for feature filtering and random dropout for avoiding overfitting. Finally, the double-word input spliced from title and subtitle, and the multi-keyword set, were each fed into the model. Experiments were conducted on the news title classification dataset of Natural Language Processing & Chinese Computing 2017 (NLP&CC2017). The experimental results show that the classification precision of the model combining triple expansion of news titles with CNN is 79.42% over 18 categories of news titles, which is 9.5% higher than the original CNN model without expansion, and that keyword expansion improves the convergence rate of the model. The proposed triple expansion method and the constructed CNN model are thus verified to be effective.

Automatic short text summarization method based on multiple mapping

LU Ling, YANG Wu, CAO Qiong

Journal of Computer Applications 2016, 36 (2): 432-436. DOI: 10.11772/j.issn.1001-9081.2016.02.0432

Traditional automatic text summarization generally has no word count requirement, while many social network platforms impose word count limits. Because of this limit, traditional summarization techniques can hardly achieve balanced performance in short text summarization. To address this problem, a new automatic short text summarization method was proposed. Firstly, the values of relationship mapping, length mapping, title mapping and position mapping were calculated, each forming a set of candidate sentences. Secondly, the candidate sentence sets were mapped to the set of abstract sentences by multiple mapping strategies according to a series of multiple mapping rules, and the recall ratio was increased by putting central sentences into the set of abstract sentences. The experimental results show that multiple mapping obtains stable performance in short text summarization: the F-measures of the ROUGE-1 and ROUGE-2 tests are 0.49 and 0.35 respectively, which are better than the average level of the NLP&CC2015 evaluation, proving the effectiveness of the method.

News recommendation method by fusion of content-based recommendation and collaborative filtering

YANG Wu, TANG Rui, LU Ling

Journal of Computer Applications 2016, 36 (2): 414-418. DOI: 10.11772/j.issn.1001-9081.2016.02.0414

To solve the poor diversity of user interests in content-based news recommendation and the cold-start problem in hybrid recommendation, a new news recommendation method based on the fusion of content-based recommendation and collaborative filtering was proposed. Firstly, the content-based method was used to find the user's existing interest. Secondly, the group of users similar to the target user was found by using a hybrid similarity pattern that contains content similarity and behavior similarity, and the user's potential interest was found by predicting the user's interest in feature words. Next, a user interest model with the characteristics of personalization and diversity was obtained by fusing the user's existing interest and potential interest. Lastly, the recommendation list was output after calculating the similarity between candidate news and the fusion model. The experimental results show that, compared with content-based recommendation methods, the proposed method clearly increases F-measure and Diversity. Its performance is equivalent to that of the hybrid recommendation method, but it does not need time to accumulate enough user clicks on candidate news and has no cold-start problem.
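The last two steps of the abstract above (fusing existing and potential interest, then ranking candidate news against the fused model) can be sketched roughly as follows. The abstract does not give the fusion formula or the similarity measure, so the linear weighting, the cosine similarity, and the names `fuse_interests`, `recommend` and `weight` are all assumptions made for illustration.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # feature word -> weight


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fuse_interests(existing: Vector, potential: Vector, weight: float = 0.5) -> Vector:
    """Assumed fusion rule: linear mix of existing and potential interest."""
    terms = set(existing) | set(potential)
    return {
        t: weight * existing.get(t, 0.0) + (1.0 - weight) * potential.get(t, 0.0)
        for t in terms
    }


def recommend(candidates: Dict[str, Vector], profile: Vector, top_n: int = 10) -> List[str]:
    """Rank candidate news ids by similarity to the fused interest model."""
    ranked = sorted(candidates, key=lambda nid: (-cosine(candidates[nid], profile), nid))
    return ranked[:top_n]
```

For example, a user whose existing interest is only "sports" but whose similar users read "finance" gets a fused profile covering both, so a candidate touching both topics ranks above a sports-only one.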

Micro-blog hot topics detection method based on user role orientation

YANG Wu, LI Yang, LU Ling

Journal of Computer Applications 2013, 33 (11): 3076-3079.

To solve the low efficiency of extracting hot topics from huge amounts of micro-blog data, a new topic detection method based on user role orientation was proposed. Firstly, noise data from some users were filtered out by user role orientation. Secondly, the feature weight was calculated by the Term Frequency-Inverse Document Frequency (TF-IDF) function combined with semantic similarity to reduce the error caused by semantic expression. Then, an improved Single-Pass clustering algorithm was used to extract the topics of micro-blogs. Lastly, the heat of micro-blog topics was evaluated according to the number of reposts and comments, and the hot topics were thus found. The results show that the average missing rate and false detection rate decrease by 12.09% and 2.37% respectively, indicating that the topic detection accuracy is effectively improved and the method is feasible.
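The clustering and heat-evaluation steps in the abstract above can be sketched as follows. This is the basic Single-Pass algorithm, not the paper's improved variant, which the abstract does not specify. It assumes each micro-blog has already been turned into a sparse weight vector (for example by TF-IDF), and it assumes heat is the plain sum of reposts and comments; the names `single_pass`, `topic_heat` and `threshold` are illustrative.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # term -> weight (e.g. TF-IDF)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(vectors: List[Vector], threshold: float = 0.5) -> List[List[int]]:
    """Basic Single-Pass clustering: each post joins its most similar
    existing topic if the similarity reaches the threshold, otherwise it
    starts a new topic. Returns clusters as lists of post indices."""
    clusters: List[List[int]] = []
    centroids: List[Vector] = []  # summed vectors; cosine ignores scale
    for i, vec in enumerate(vectors):
        best, best_sim = -1, 0.0
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best >= 0 and best_sim >= threshold:
            clusters[best].append(i)
            for term, weight in vec.items():
                centroids[best][term] = centroids[best].get(term, 0.0) + weight
        else:
            clusters.append([i])
            centroids.append(dict(vec))
    return clusters


def topic_heat(clusters: List[List[int]], reposts: List[int], comments: List[int]) -> List[int]:
    """Assumed heat score: total reposts plus comments over a topic's posts."""
    return [sum(reposts[i] + comments[i] for i in cluster) for cluster in clusters]
```

Sorting topics by the returned heat values in descending order would then give the hot-topic ranking. Because Single-Pass processes posts in one sweep, its result depends on input order.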